NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Trust but Verify: How to Leverage Policies, Workflows, and Infrastructure to Ensure Computational Reproducibility in Publication

https://doi.org/10.1162/99608f92.25982dcf

Willis, Craig; Stodden, Victoria (February 2021, Harvard Data Science Review)

Full Text Available
The data science life cycle: a disciplined approach to advancing data science as a science

https://doi.org/10.1145/3360646

Stodden, Victoria (June 2020, Communications of the ACM)

A cycle that traces ways to define the landscape of data science.
more » « less
Full Text Available
Domain-Specific Fixes for Flaky Tests with Wrong Assumptions on Underdetermined Specifications

https://doi.org/10.1109/ICSE43902.2021.00018

Zhang, Peilun; Jiang, Yanjie; Wei, Anjiang; Stodden, Victoria; Marinov, Darko; Shi, August (May 2021, 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE))
null (Ed.)
Full Text Available
Beyond Open Data: A Model for Linking Digital Artifacts to Enable Reproducibility of Scientific Claims

https://doi.org/10.1145/3391800.3398172

Stodden, Victoria (January 2020, Third International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS'20))

The last few years has seen a substantial push toward “Open Data” by policy makers, researchers, archivists, and even the public. This article postulates that the value of data is not intrinsic but instead derives from its ability to produce knowledge; the extraction of which from data is not deterministic. The value of data is realized through a focus on the reproducibility of the findings from the data, which acknowledges the complexity of the leap from data to knowledge, and the inextricable interrelationships between data, software, computational environments and cyberinfrastructure, and knowledge. Modern information archiving practices have a long history and were shaped in a pre-digital world comprised of physical objects such as books, monographs, film, paper, and other physical artifacts. This article argues that “data,” the modern collection of digital bits representing empirical measurements, is a wholly new entity and not a digital analog to any physical object. It further argues that a focus on the interrelationships between digital artifacts and their unique properties, instead of Open Data alone, will instead produce an augmented and more useful understanding of knowledge when it is derived from digital data. Data-derived knowledge, represented by claims in the scholarly record, must persistently link to immutable versions of the digital artifacts from which it was derived, including 1) any data, 2) software that allows access to the data and the regeneration of those claims that rely on the version of the data, and 3) computational environment information including input parameters, function invocation sequences, and resource details. In this sense the epistemological gap between data and extracted knowledge can be closed. Datasets and software are often subject to change and revision, sometimes even with high velocity, and such changes imply new versions with new unique identifiers. We propose considering knowledge, rather than data in isolation, with a schematic model representing the interconnectedness of datasets, software, and computational information upon which its derivation depends. Capturing the interconnectedness of these digital artifacts, and their relationship to the knowledge they generate, is essential for supporting the reproducibility, transparency, and cognitive tractability of scientific claims derived from digital data.
more » « less
Full Text Available
Understanding Reproducibility and Characteristics of Flaky Tests Through Test Reruns in Java Projects

https://doi.org/10.1109/ISSRE5003.2020.00045

Lam, Wing; Winter, Stefan; Astorga, Angello; Stodden, Victoria; Marinov, Darko (October 2020, 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE))
null (Ed.)
Full Text Available
Scientific Tests and Continuous Integration Strategies to Enhance Reproducibility in the Scientific Software Context

https://doi.org/https://doi.org/10.1145/3322790.3330595

Krafczyk, Matthew; Shi, August; Bhaskar, Adhithya; Marinov, Darko; Stodden, Victoria (June 2019, 2nd International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS’19))

Continuous integration (CI) is a well-established technique in commercial and open-source software projects, although not routinely used in scientific publishing. In the scientific software context, CI can serve two functions to increase reproducibility of scientific results: providing an established platform for testing the reproducibility of these results, and demonstrating to other scientists how the code and data generate the published results. We explore scientific software testing and CI strategies using two articles published in the areas of applied mathematics and computational physics. We discuss lessons learned from reproducing these articles as well as examine and discuss existing tests. We introduce the notion of a scientific test as one that produces computational results from a published article. We then consider full result reproduction within a CI environment. If authors find their work too time or resource intensive to easily adapt to a CI context, we recommend the inclusion of results from reduced versions of their work (e.g., run at lower resolution, with shorter time scales, with smaller data sets) alongside their primary results within their article. While these smaller versions may be less interesting scientifically, they can serve to verify that published code and data are working properly. We demonstrate such reduction tests on the two articles studied.
more » « less
Full Text Available
Initial Thoughts on Cybersecurity And Reproducibility

https://doi.org/10.1145/3322790.3330593

Deelman, Ewa; Stodden, Victoria; Taufer, Michela; Welch, Von (January 2019, 2nd International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS'19))

Cybersecurity, which serves to protect computer systems and data from malicious and accidental abuse and changes, both supports and challenges the reproducibility of computational science. This position paper explores a research agenda by enumerating a set of two types of challenges that emerge at the intersection of cybersecurity and reproducibility: challenges that cybersecurity has in supporting the reproducibility of computational science, and challenges cybersecurity creates for reproducibility of computational science.
more » « less
Full Text Available
Open access to research artifacts: Implementing the next generation data management plan

https://doi.org/10.1002/pra2.51

Stodden, Victoria; Ferrini, Vicki; Gabanyi, Margaret; Lehnert, Kerstin; Morton, John; Berman, Helen (January 2019, Proceedings of the Association for Information Science and Technology)
null (Ed.)
Full Text Available
Same data, different conclusions: Radical dispersion in empirical results when independent analysts operationalize and test the same hypothesis

https://doi.org/10.1016/j.obhdp.2021.02.003

Schweinsberg, Martin; Feldman, Michael; Staub, Nicola; van den Akker, Olmo R.; van Aert, Robbie C.M.; van Assen, Marcel A.L.M.; Liu, Yang; Althoff, Tim; Heer, Jeffrey; Kale, Alex; et al (July 2021, Organizational Behavior and Human Decision Processes)
null (Ed.)
Full Text Available

Search for: All records